Lexicon and Grammar in Probabilistic Tagging of Written English
Abstract
The paper describes the development of software for automatic grammatical analysis of unrestricted, unedited English text at the Unit for Computer Research on the English Language (UCREL) at the University of Lancaster. The work is currently funded by IBM and carried out in collaboration with colleagues at IBM UK (Winchester) and IBM Yorktown Heights. The paper will focus on the lexicon component of the word tagging system, the UCREL grammar, the databanks of parsed sentences, and the tools that have been written to support development of these components. This work has applications to speech technology, spelling correction, and other areas of natural language processing. Currently, our goal is to provide a language model that uses transition statistics to disambiguate alternative parses for a speech recognition device.

1. Text Corpora

Historically, philologists and linguists of differing persuasions have regarded the use of text corpora to provide empirical data for testing grammatical theories as important to varying degrees. The use of corpus citations in grammars and dictionaries predates electronic data processing (Brown, 1984: 34). While most generative grammarians of the 1960s and 1970s ignored corpus data, the increased power of the new technology nevertheless points the way to new applications of computerized text corpora in dictionary making, style checking and speech recognition. Computer corpora present the computational linguist with the diversity and complexity of real language, which is more challenging for testing language models than intuitively derived examples. Ultimately, grammars must be judged by their ability to contend with the real facts of language, not just with basic constructs extrapolated by grammarians.

2. Word Tagging

The system devised for automatic word tagging, or part-of-speech selection, in running English text is the Constituent-Likelihood Automatic Word-tagging System (CLAWS) (Garside et al., 1987), and it serves as the basis for the current work.
The word tagging system is an automated component of the probabilistic parsing system we are currently working on. In word tagging, each running word in the corpus text to be processed is associated with a pre-terminal symbol denoting its word class. In essence, the CLAWS suite can be conceptually divided into two phases: tag assignment and tag selection.
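The two phases can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the toy lexicon, the small tagset (AT, NN, MD, VB, VBZ, NNS), the transition probabilities, and the names LEXICON, TRANSITIONS, FLOOR, assign_tags and select_tags are hypothetical, not taken from CLAWS. Tag assignment is modelled as a lexicon lookup that lists every candidate tag for a word; tag selection is modelled with a standard first-order Viterbi search over tag bigrams, a stand-in for the constituent-likelihood calculation rather than the CLAWS procedure itself.

```python
import math

# Hypothetical toy lexicon (not the CLAWS lexicon or tagset):
# word -> {candidate tag: lexical likelihood of the word given that tag}.
LEXICON = {
    "the": {"AT": 1.0},
    "can": {"MD": 0.7, "NN": 0.2, "VB": 0.1},
    "rusts": {"VBZ": 0.6, "NNS": 0.4},
}

# Hypothetical tag-transition probabilities P(tag | previous tag);
# "^" marks the start of a sentence.
TRANSITIONS = {
    "^": {"AT": 0.6, "NN": 0.1, "MD": 0.05},
    "AT": {"NN": 0.7, "JJ": 0.25, "MD": 0.01, "VB": 0.01, "NNS": 0.03},
    "NN": {"VBZ": 0.5, "MD": 0.2, "NNS": 0.1},
    "MD": {"VB": 0.8, "VBZ": 0.01, "NNS": 0.01},
    "VB": {"AT": 0.5, "NN": 0.1},
}

FLOOR = 1e-6  # probability assumed for transitions never seen in training


def assign_tags(words):
    """Phase 1 (tag assignment): list every candidate tag for each word."""
    # Unknown words fall back to a single noun reading (a crude placeholder).
    return [LEXICON.get(w.lower(), {"NN": 1.0}) for w in words]


def transition(prev, tag):
    """Look up P(tag | prev), falling back to FLOOR for unseen pairs."""
    return TRANSITIONS.get(prev, {}).get(tag, FLOOR)


def select_tags(candidates):
    """Phase 2 (tag selection): Viterbi search for the most probable tag sequence."""
    # best maps each tag of the current word to (log score, best path ending in it).
    best = {"^": (0.0, [])}
    for options in candidates:
        new_best = {}
        for tag, lex_p in options.items():
            new_best[tag] = max(
                (score + math.log(transition(prev, tag)) + math.log(lex_p),
                 path + [tag])
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]


words = ["The", "can", "rusts"]
print(list(zip(words, select_tags(assign_tags(words)))))
# [('The', 'AT'), ('can', 'NN'), ('rusts', 'VBZ')]
```

In this toy example the lexicon on its own favours the modal reading of "can", but the transition statistics following the article make the noun reading the most probable path, which is the kind of contextual disambiguation the tag selection phase is meant to perform.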
Publication date: 1988